Assessing the visual speech perception of sampled-based talking heads

Authors

  • Paula Dornhofer Paro Costa
  • José Mario De Martino
Abstract

Focusing on flexible applications for limited computing devices, this paper investigates the improvement in visual speech perception obtained by implicitly modeling coarticulation in a sample-based talking head characterized by a compact image database and a viseme-morphing synthesis strategy. Speech intelligibility tests were applied to assess the effectiveness of the proposed context-dependent visemes (CDV) model, comparing it to a simpler model that does not handle coarticulation. The results show that, compared to the simpler model, the CDV approach improves speech intelligibility in situations in which the audio is degraded by noise. Moreover, in the tested cases the CDV model achieves 80% to 90% of the visual speech intelligibility of video of a real talker. Additionally, when the audio is heavily degraded by noise, the results suggest that the mechanisms underlying visual speech perception depend on the quality of the audible information.


Similar articles

Comparison of visual speech perception of sampled-based talking heads: adults and children with and without developmental dyslexia

Among other applications, videorealistic talking heads are envisioned as a programmable tool for training skills that involve observation of the human face. This work presents partial results of a study conducted with adults and children with and without developmental dyslexia to evaluate the level of speech intelligibility provided by a sample-based talking head model in comparison with unimodal ...


Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence

In this chapter, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted in which speech quality is acoustically degraded and the speech is then presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raising gestures. The experim...


Segmental optical phonetics for human and machine speech processing

That talkers produce optical as well as acoustic speech signals, and that perceivers process both types of signals has become well known. Although perceptual effects due to audiovisual speech integration have been a focus of research involving the visual speech stimulus, relatively little is known about visual-only speech perception and optical phonetic signals. This knowledge is needed to expl...


Audio-visual quality as combination of unimodal qualities: environmental effects on talking heads

Introduction Talking heads provide a multimodal output component for human-computer interfaces. They consist of facial visual models that are synchronized with speech synthesis modules with respect to speech articulation. Because they are reduced to a human head or upper body, articulation is often more clearly visible than with a full human body, owing to the potentially larger display of the head. There...


Cloning synthetic talking heads

The quality of Text-to-Visual-Speech synthesis is judged by how well it matches the visual perception of speech articulators with acoustic speech perception. Concurrently, different viewers often prefer different head models for subjective reasons. Traditional facial animation approaches tied the parameterization of animation directly to the model. Switching the head model is difficult because a leng...




Publication date: 2013